Data-driven perceptually based join costs
Authors
Abstract
Concatenative speech synthesis systems attempt to minimize audible discontinuities between two successive concatenated units. In unit selection concatenative synthesis, a join cost is calculated that is intended to predict the extent of audible discontinuity introduced by the concatenation of two specific units. A study was conducted that used human perceptual data on the detectability of mid-vowel concatenation discontinuities to train and to test several models for predicting perceptually-based join costs. Both linear regression (LR) and classification and regression tree (CART) models were used. Each was trained on several different sets of predictor variables. All LR and some CART models used strictly acoustic predictor variables, some CART models used acoustic plus phonetic categorical variables, and one CART model used strictly phonetic predictors. Results from tests of LR and CART models showed that, when trained with the same acoustic predictor variables, the two models achieved very similar results in predicting human detection rates. Euclidean cepstral distances were superior to VQ cepstral distances as predictor variables. Categorical phonetic predictor variables in CART models greatly improved the accuracy of prediction of concatenation discontinuities.
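As a rough illustration of the kind of model the abstract describes, the sketch below computes a Euclidean cepstral distance across a join and fits a one-predictor linear regression that maps that distance to a human detection rate. It is a minimal sketch under stated assumptions, not the study's method: the abstract does not give the feature set, the number of predictors, or the fitting procedure, and the function names and toy numbers here are hypothetical.

```python
import math

def euclidean_cepstral_distance(left_frame, right_frame):
    """Euclidean distance between the cepstral vectors on either side of a join.

    Both arguments are equal-length sequences of cepstral coefficients
    (hypothetical inputs; the abstract does not specify the exact features).
    """
    return math.dist(left_frame, right_frame)

def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_detection_rate(slope, intercept, distance):
    """Predicted fraction of listeners who hear the join, clipped to [0, 1]."""
    return min(1.0, max(0.0, slope * distance + intercept))

# Toy training data (invented for illustration only):
# cepstral distance at the join -> observed detection rate.
toy_distances = [0.0, 1.0, 2.0, 3.0]
toy_detection_rates = [0.1, 0.3, 0.5, 0.7]

slope, intercept = fit_simple_linear_regression(toy_distances, toy_detection_rates)
join_distance = euclidean_cepstral_distance([0.0, 0.0], [3.0, 4.0])
join_cost = predict_detection_rate(slope, intercept, join_distance)
```

The predicted detection rate can then serve directly as a perceptually based join cost. A CART model would replace the fitted line with a regression tree over the same predictors, optionally extended with categorical phonetic variables.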
Similar resources
Perceptually-based data-driven join costs: comparing join types
Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...
Perceptually-based Data-driven Join Co
Unit selection synthesis has improved the quality of synthetic speech by making it possible to concatenate speech from a large database to produce intelligible synthesis while preserving much of the naturalness of the original signal. Such synthesis is by no means perfect, however, and this paper describes work to achieve more optimal joins between concatenated units. Results from a psychoacous...
Time Driven Activity Based Costing: Theory, Applications and Limitations
The aim of this study is to explore the strategic applications and limitations of Time-driven Activity-based Costing (TDABC) and to evaluate the degree of accuracy of the proponents’ arguments concerning its usefulness. In this study, published works directly related to this area from the period 2004-2015 are analyzed. This study reports TDABC's applications in strategic areas such as cost of p...
A classifier-based target cost for unit selection speech synthesis trained on perceptual data
Our goal is to automatically learn a perceptually-optimal target cost function for a unit selection speech synthesiser. The approach we take here is to train a classifier on human perceptual judgements of synthetic speech. The output of the classifier is used to make a simple three-way distinction rather than to estimate a continuously-valued cost. In order to collect the necessary perceptual d...
A Model-Driven Decision Support System for Software Cost Estimation (Case Study: Projects in NASA60 Dataset)
Estimating the costs of software development is one of the most important activities in software project management. Inaccuracies in such estimates may cause irreparable loss. A low estimate of project cost will result in failure to deliver on time and indicates inefficiency in the software development team. On the other hand, high estimates of resources and costs for a project wil...